{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Centrality in networks\n",
    "### \"Which are the most important [central] vertices in the network?\"\n",
    "(by Fiona Pigott)\n",
    "\n",
    "A centrality metric is a measure on a network intended to rank vertices by importance, and different centrality measures reflect different notions of the \"importance\" of a vertex. \n",
    "\n",
     "Because the idea of the \"importance\" of a vertex depends on the problem at hand, different centrality measures differ partly because they are measuring different things, and the choice of a centrality metric should depend on exactly what you are trying to quantify.\n",
    "\n",
     "This notebook looks long, but half of it is plotting code (sorry). It is intended to illustrate Chapter 7, sections 7.1-7.7 of M. Newman's *Networks: An Introduction*. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# handy graph library for python\n",
    "import igraph\n",
    "# science\n",
    "import numpy as np\n",
    "from collections import defaultdict\n",
    "# plot things\n",
    "import tabulate\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# get some toy graph data so we can demonstrate these properties\n",
    "network = igraph.Nexus.get(\"kaptail\")['KAPFTI1']\n",
    "network.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# visualize the graph. if you can't get Cairo to work, don't worry about it.\n",
    "# if you are worried about it anyway, google \"cairo py2cairo\" and sort this mess out for yourself\n",
    "igraph.plot(network)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A quick graph vocabulary refresher\n",
    "- **adjacency matrix**: A (square) matrix $A$ with one row/col per node in the network. An entry of 1 at the location $(i,j)$ indicates an edge from node $i$ to node $j$"
   ]
  },
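  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch (on a hypothetical 3-vertex directed graph, not the network loaded above), an adjacency matrix can be built directly from an edge list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "# a hypothetical 3-vertex directed graph with edges 0->1, 0->2, 2->1\n",
    "toy_edges = [(0, 1), (0, 2), (2, 1)]\n",
    "toy_adj = np.zeros((3, 3), dtype=int)\n",
    "for i, j in toy_edges:\n",
    "    toy_adj[i, j] = 1  # entry (i,j) = 1 means an edge from vertex i to vertex j\n",
    "print(toy_adj)"
   ]
  },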
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Using canned methods from igraph won't be very illustrative, so for the most part, I'll use the adjacency matrix\n",
    "adj_mat = np.array(network.get_adjacency().data)\n",
    "num_nodes = adj_mat.shape[0]\n",
    "fig, axes = plt.subplots(1, figsize = (5,5))\n",
    "axes.set_title(\"Adjacency Matrix (black = edge)\")\n",
    "axes.matshow(adj_mat, cmap = 'gist_heat_r')\n",
    "\n",
    "# these are just constants that we'll want to use\n",
    "# Get some 1s we'll need repeatedly\n",
    "ones = np.ones((num_nodes, 1))\n",
    "normed_ones = np.divide(ones, np.linalg.norm(ones))\n",
    "zeros = np.zeros((num_nodes, 1))\n",
    "identity = np.eye(num_nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- **degree**: $k_i$, the number of edges connected to a given node $i$\n",
    "- **in-degree**: $k^{in}_i$, the number of directed edges pointing to a given node in a directed graph, the column sum of the adjacency matrix ($k^{in}_i = \\sum_{j=0}^{n} A_{ji}$)\n",
    "- **out-degree**: $k^{out}_i$, the number of directed edges originating from a given node in a directed graph, the row sum of the adjacency matrix ($k^{out}_i = \\sum_{j=0}^{n} A_{ij}$)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# total number of nodes in the graph = size of the adjacency matrix\n",
    "num_nodes = adj_mat.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# calculate the in- and out- degree\n",
    "outdegree = np.sum(adj_mat,1)\n",
    "indegree = np.sum(adj_mat,0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Degree centrality\n",
    "#### The most important vertex has the highest degree\n",
    "One possible measure of importance is simply vertex degree. Degree is a simple metric, and it's easy to compute because it depends only on the immediate neighborhood of each vertex.\n",
    "\n",
    "Even something as simple as \"degree centrality,\" however, can have subtleties. For instance, is the most important vertex the one with the most incoming edges or the most outgoing edges? This would depend a lot on exactly what you are trying to measure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# graph the vertices ordered by in + out degree\n",
    "names_and_degrees = {network.vs[\"name\"][i]: {\"in\": indegree[i], \"out\": outdegree[i]} for i in range(0,num_nodes)}\n",
    "sorted_names_and_degrees = sorted(names_and_degrees.items(), key = lambda x: -1*(x[1][\"in\"] + x[1][\"out\"]))\n",
    "sorted_indegree = [x[1][\"in\"] for x in sorted_names_and_degrees]\n",
    "sorted_outdegree = [x[1][\"out\"] for x in sorted_names_and_degrees]\n",
    "sorted_names = [x[0] for x in sorted_names_and_degrees]\n",
    "fig, axes = plt.subplots(3,1, figsize = (16,20))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "axes[0].set_title(\"Degree for each node\", size = 16)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], sorted_indegree, color = 'darkorange', width = .8)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], sorted_outdegree, bottom = sorted_indegree, color = 'navajowhite', width = .8)\n",
    "axes[0].legend([\"in-degree\", \"out-degree\"])\n",
    "axes[0].set_ylim(0,max(outdegree + indegree) + 1)\n",
    "axes[0].set_ylabel(\"degree\", size = 16)\n",
    "axes[0].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels(sorted_names, rotation = 90)\n",
    "# graph the vertices sorted by the indegree\n",
    "sorted_by_indegrees = sorted(names_and_degrees.items(), key = lambda x: -1*(x[1][\"in\"]))\n",
    "axes[1].bar([x + .4 for x in range(0,num_nodes)], [x[1][\"in\"] for x in sorted_by_indegrees], color = 'darkorange', width = .8)\n",
    "axes[1].set_ylim(0,max(outdegree + indegree) + 1)\n",
    "axes[1].set_ylabel(\"indegree\", size = 16)\n",
    "axes[1].legend([\"in-degree\"])\n",
    "axes[1].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in sorted_by_indegrees], rotation = 90)\n",
    "# graph the vertices ordered by the out degree\n",
    "sorted_by_outdegrees = sorted(names_and_degrees.items(), key = lambda x: -1*(x[1][\"out\"]))\n",
    "axes[2].bar([x + .4 for x in range(0,num_nodes)], [x[1][\"out\"] for x in sorted_by_outdegrees], color = 'navajowhite', width = .8)\n",
    "axes[2].set_ylim(0,max(outdegree + indegree) + 1)\n",
    "axes[2].set_xlabel(\"vertex\", size = 16)\n",
     "axes[2].set_ylabel(\"outdegree\", size = 16)\n",
    "axes[2].legend([\"out-degree\"])\n",
    "axes[2].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[2].set_xticklabels([x[0] for x in sorted_by_outdegrees], rotation = 90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# I'm also going to compute total degree, scaled from 0 to 1, so that I can compare with other centrality metrics later\n",
    "max_degree = float(max([x[1][\"in\"] + x[1][\"out\"] for x in sorted_names_and_degrees]))\n",
    "scaled_degree = {x[0]: (x[1][\"in\"] + x[1][\"out\"])/max_degree for x in sorted_names_and_degrees}\n",
    "sorted_degree_centrality = sorted(scaled_degree.items(), key = lambda x:-x[1])\n",
    "#max_degree = float(max([x[1][\"in\"] for x in sorted_names_and_degrees]))\n",
    "#scaled_degree = {x[0]: (x[1][\"in\"])/max_degree for x in sorted_names_and_degrees}\n",
    "#sorted_degree_centrality = sorted(scaled_degree.items(), key = lambda x:-x[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Eigenvector Centrality\n",
    "#### The most important vertex has important neighbors\n",
     "Eigenvector centrality measures the importance of a vertex as a sum of the importance of its neighbors. You can think of this as each vertex \"voting\" for its neighbors (vertex $i$ votes for vertex $j$ if there is an edge between $i$ and $j$, i.e., if $A_{ij} = 1$).\n",
    "\n",
    "Iteratively, we can then solve for eigenvector centrality:\n",
    "$$ x'_{i} = \\sum_j A_{ij}x_j$$\n",
    "On each iteration, estimates of $x'$ are updated by the previous weights. You might recognize $\\sum_j A_{ij}x_j$ as the definition of the matrix multiplication $\\mathbf{A}\\mathbf{x}$.\n",
    "\n",
    "Essentially we are solving for the fixed point (where $\\mathbf{x}$ is fixed, save some constant factor) of $\\mathbf{A}\\mathbf{x} = \\lambda\\mathbf{x}$: the equation for the eigenvectors of $\\mathbf{A}$. The vector $\\mathbf{x}$ is in fact the eigenvector corresponding to the largest eigenvalue of $\\mathbf{A}$.\n",
    "\n",
    "#### A slightly longer explanation of that last sentence\n",
    "I glossed over the fact that $\\mathbf{x}$ is specifically the *principal* eigenvector of $\\mathbf{A}$. \n",
    "\n",
    "From *Networks* Ch 7: \n",
    "\n",
    "Many iterations of the equation $\\mathbf{x}' = \\mathbf{A}\\mathbf{x}$ gives $\\mathbf{x}(t) = \\mathbf{A}^t\\mathbf{x}(0)$. \n",
     "If $\\mathbf{x}(0)$ (some initial guess for the centralities) can be written as a linear combination of the eigenvectors $\\mathbf{v_i}$ of $\\mathbf{A}$ (guaranteed whenever $\\mathbf{A}$ is diagonalizable, e.g., for an undirected graph, where $\\mathbf{A}$ is symmetric), we can rewrite $\\mathbf{x}(0)$ as that combination $\\mathbf{x}(0) = \\sum_{i}c_i\\mathbf{v_i}$. Then:\n",
     "$$ \\mathbf{x}(t) =\\mathbf{A}^t \\sum_{i}c_i\\mathbf{v_i} = \\sum_{i}\\lambda_i^t c_i\\mathbf{v_i} $$\n",
     "Multiply by 1 in the form of $\\frac{\\lambda_1^t}{\\lambda_1^t}$, where $\\lambda_1$ is the principal (largest) eigenvalue:\n",
    "$$ = {\\lambda_1}^t \\sum_{i} \\left(\\frac{\\lambda_i}{\\lambda_1}\\right)^t c_i\\mathbf{v_i} $$\n",
     "Then, because $\\left|\\frac{\\lambda_i}{\\lambda_1}\\right| < 1$ for $i \\neq 1$, as the number of iterations $t \\rightarrow \\infty, \\left(\\frac{\\lambda_i}{\\lambda_1}\\right)^t \\rightarrow 0 \\; \\forall \\; i \\neq 1$ \n",
    "\n",
    "$$ = {\\lambda_1}^t  c_1\\mathbf{v_1} $$\n",
    "\n",
    "So $\\mathbf{x} \\propto \\mathbf{v_1}$"
   ]
  },
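  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on this derivation, here is a sketch on a hypothetical symmetric toy matrix (the triangle graph, so the eigenvalues are real): repeated multiplication by $\\mathbf{A}$ aligns an arbitrary starting vector with the principal eigenvector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "# hypothetical toy: the triangle graph, with eigenvalues 2, -1, -1\n",
    "A_toy = np.array([[0., 1., 1.],\n",
    "                  [1., 0., 1.],\n",
    "                  [1., 1., 0.]])\n",
    "x = np.array([1., 0., 0.])  # an arbitrary starting guess\n",
    "for _ in range(50):\n",
    "    x = np.dot(A_toy, x)\n",
    "    x = x / np.linalg.norm(x)  # re-normalize so the values don't blow up\n",
    "# compare against the principal eigenvector computed directly\n",
    "e_vals, e_vecs = np.linalg.eig(A_toy)\n",
    "v1 = e_vecs[:, np.argmax(np.real(e_vals))]\n",
    "# x should match v1 up to sign, so this difference should be near zero\n",
    "print(min(np.linalg.norm(x - v1), np.linalg.norm(x + v1)))"
   ]
  },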
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# set some initial guess for the centralities. giving every node the same value is a good start\n",
    "eigenvector_centrality = normed_ones\n",
    "# give an initial error value and a tolerance to stop iterating \n",
    "err = 100\n",
    "tol = .01\n",
    "while err > tol:\n",
    "    # calculate x' with Ax\n",
    "    eigenvector_centrality_new_unnormed = np.dot(adj_mat, eigenvector_centrality)\n",
     "    # norm your x values (only the proportions matter; if you don't normalize, the values blow up)\n",
    "    eigenvector_centrality_new = np.divide(eigenvector_centrality_new_unnormed, \n",
    "                                       np.linalg.norm(eigenvector_centrality_new_unnormed))\n",
    "    # calculate the error\n",
    "    err = sum(abs(np.subtract(eigenvector_centrality_new, eigenvector_centrality)))\n",
    "    # set the new centrality vector\n",
    "    eigenvector_centrality = eigenvector_centrality_new\n",
    "\n",
    "# sort and scale the centrality metric from 0 to 1\n",
    "sorted_eigenvector_centrality = sorted(zip(network.vs[\"name\"], \n",
    "                                           [x/max(eigenvector_centrality) for x in eigenvector_centrality]), \n",
    "                                           key = lambda x: -x[1])\n",
    "dict_eigenvector_centrality = dict(sorted_eigenvector_centrality)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot things\n",
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "axes[0].set_title(\"Eigenvector Centrality\", size = 16)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], [x[1] for x in sorted_eigenvector_centrality], \n",
    "         color = 'royalblue', width = .8)\n",
    "axes[0].set_ylabel(\"Eigenvector Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_eigenvector_centrality], rotation = 90)\n",
    "node_ordering = sorted_degree_centrality #sorted_eigenvector_centrality\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue')\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite')\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = .5)\n",
    "axes[1].set_ylabel(\"Eigenvector Centrality vs Degree\", size = 16)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
     "# matrix formulation. this is only guaranteed to match the iterative version when the principal\n",
     "# eigenvalue is unique and dominant (not true, e.g., in a disconnected graph)\n",
    "# get the eigenvalues and vectors\n",
    "e_vals, e_vecs = np.linalg.eig(adj_mat)\n",
    "# find the principal eigenvector\n",
     "largest_eval_at = np.argmax(np.real(e_vals))\n",
     "# centralities are proportional to the principal eigenvector, up to a (potentially negative) constant factor\n",
     "matrix_eigen = sorted(zip(network.vs[\"name\"], e_vecs[:,largest_eval_at]), key = lambda x: -1*abs(x[1]))\n",
     "print(tabulate.tabulate([[x[0], np.real(x[1])] for x in matrix_eigen][0:8]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "One potential problem with eigenvector centrality is that, in a directed network, only vertices in a strongly connected component of two or more vertices (a set of vertices in which each vertex can be reached from every other), or in the out-component of such a component, can have non-zero eigenvector centrality.\n",
    "\n",
    "Below is an example of a graph which has no strongly connected components greater than 1 vertex, and so eigenvector centrality breaks down:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "tree = igraph.Graph(directed = True)\n",
    "tree.add_vertices([0,1,2,3,4,5,6])\n",
    "tree.add_edges([(0,1),(0,2),(1,3),(1,4),(2,5),(2,6)])\n",
    "tree.vs[\"label\"] = [0,1,2,3,4,5,6]\n",
    "igraph.plot(tree, layout = tree.layout(\"sugiyama\"), bbox=(0, 0, 150, 150), vertex_label_size = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# I'll use the built-in eigenvector centrality here for brevity, but it's the same algorithm\n",
    "tree.eigenvector_centrality()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Katz Centrality\n",
    "#### The most important vertex has important neighbors, and every neighbor is important (at least a little)\n",
     "We can get meaningful centrality values for the above case if we give every node at least a little centrality \"for free,\" rather than having weakly connected or disconnected vertices get a centrality of exactly zero. The iterative centrality equation becomes:\n",
     "$$ x_{i}' = \\alpha \\sum_j A_{ij}x_j + \\beta $$\n",
     "where the $\\beta$ term is the centrality that every node gets.\n",
    "\n",
     "This can be rewritten in matrix form:\n",
    "$$ \\mathbf{x} = \\alpha \\mathbf{A}\\mathbf{x} + \\beta \\mathbf{1} $$\n",
    "\n",
    "Which can be solved for $\\mathbf{x}$:\n",
     "$\\mathbf{x} = (\\mathbf{I} - \\alpha \\mathbf{A})^{-1}\\beta\\mathbf{1} \\;$ ($\\beta$ can be set to 1, since it is just an overall multiplicative constant)\n",
    "\n",
     "Now, note that $\\alpha$ is a free parameter: $\\alpha = 0$ would make the Katz centrality simply $\\beta\\mathbf{1}$, and the larger $\\alpha$ is, the greater the emphasis on the eigenvector term. But $\\alpha$ cannot be arbitrarily large: the term $(\\mathbf{I} - \\alpha \\mathbf{A})^{-1}$ diverges when $\\text{det}(\\mathbf{I} - \\alpha \\mathbf{A}) = 0$, which first happens at $\\alpha = 1/\\lambda_1$, so it is necessary to choose $\\alpha$ smaller than that. Below is an illustration of how the determinant depends on $\\alpha$.\n",
    "\n"
   ]
  },
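  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $\\text{det}(\\mathbf{I} - \\alpha \\mathbf{A}) = \\prod_i (1 - \\alpha\\lambda_i)$, the determinant first vanishes at $\\alpha = 1/\\lambda_1$. A small sketch (the helper name `katz_alpha_bound` is my own, and the check uses a hypothetical 2-vertex graph rather than this notebook's data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def katz_alpha_bound(A):\n",
    "    # the largest usable alpha: the reciprocal of the principal eigenvalue of A\n",
    "    return 1.0 / max(np.real(np.linalg.eigvals(A)))\n",
    "\n",
    "# sanity check on a hypothetical 2-vertex undirected graph:\n",
    "# its eigenvalues are +1 and -1, so alpha must be less than 1\n",
    "print(katz_alpha_bound(np.array([[0., 1.], [1., 0.]])))\n",
    "# in this notebook, katz_alpha_bound(adj_mat) should sit just above the chosen katz_alpha"
   ]
  },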
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
     "alpha_values = np.arange(0.219,.4,.0005)\n",
     "# note: det(A - (1/alpha)*I) has the same zeros (in alpha) as det(I - alpha*A)\n",
     "det_values = [np.linalg.det(np.subtract(adj_mat, 1.0/a * np.eye(adj_mat.shape[0]))) for a in alpha_values]\n",
    "f, axes = plt.subplots(1,2, figsize = (14,5))\n",
    "axes[0].plot(alpha_values, [0 for a in alpha_values], 'k')\n",
    "axes[1].plot(alpha_values, [0 for a in alpha_values], 'k')\n",
    "axes[0].plot(alpha_values, det_values)\n",
    "axes[1].plot([0.21988], [0], 'ro')\n",
    "axes[1].plot(alpha_values, det_values)\n",
    "xlim0 = axes[0].set_xlim(0.219,.32)\n",
    "xlim1 = axes[1].set_xlim(0.219,.23)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "alpha_index = np.where(np.array(det_values) > 0)[0][0] - 1\n",
    "katz_alpha = alpha_values[alpha_index] * .9\n",
     "print(\"The chosen alpha value is: {}\".format(katz_alpha))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate Katz centrality with the chosen $\\alpha$ using the iterative formulation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# iterative formulation\n",
    "katz_centrality = zeros\n",
    "err = 100\n",
    "tol = .01\n",
    "while err > tol:\n",
    "    katz_centrality_new = katz_alpha * np.dot(adj_mat, katz_centrality) + ones\n",
    "    err = sum(abs(np.subtract(katz_centrality_new, katz_centrality)))\n",
    "    katz_centrality = katz_centrality_new\n",
    "sorted_katz_centrality = sorted(zip(network.vs[\"name\"], \n",
    "                                                [x[0]/max(katz_centrality)[0] for x in katz_centrality]), \n",
    "                                key = lambda x: -x[1])\n",
     "dict_katz_centrality = dict(sorted_katz_centrality)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "# plot katz centrality\n",
    "axes[0].set_title(\"Katz Centrality\", size = 16)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], [x[1] for x in sorted_katz_centrality], \n",
    "         color = 'limegreen', width = .8)\n",
    "axes[0].set_ylabel(\"Katz Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_katz_centrality], rotation = 90)\n",
    "# compare with other measures\n",
    "fade = .3\n",
    "node_ordering = sorted_degree_centrality #sorted_katz_centrality\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen')\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = fade)\n",
    "axes[1].set_ylabel(\"Katz vs Eigenvector vs Degree\", size = 16)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"katz centrality\", \"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# in case you aren't convinced, here is the matrix formulation\n",
    "matrix_form_katz_centrality = np.dot(np.linalg.inv(identity - katz_alpha*adj_mat), ones)\n",
    "sorted_matrix_form_katz_centrality = sorted(zip(network.vs[\"name\"], [x[0] for x in matrix_form_katz_centrality]),\n",
    "                                            key = lambda x: -x[1])\n",
     "print(tabulate.tabulate([[x[0],x[1]] for x in sorted_matrix_form_katz_centrality][0:8]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "This is how Katz centrality behaves on the directed acyclic graph for which eigenvector centrality failed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# and for the tree graph case\n",
    "tree_adj_mat = np.array(tree.get_adjacency().data)\n",
    "katz_centrality_tree = np.dot(np.linalg.inv(np.eye(len(tree.vs)) - katz_alpha*tree_adj_mat), \n",
    "                              np.ones(len(tree.vs)))\n",
    "katz_centrality_tree = sorted(zip(tree.vs[\"label\"], [x for x in katz_centrality_tree]),\n",
    "                                            key = lambda x: -x[1])\n",
     "print(tabulate.tabulate([[x[0],x[1]] for x in katz_centrality_tree][0:8]))\n",
    "igraph.plot(tree, layout = tree.layout(\"sugiyama\"), bbox=(0, 0, 150, 150), vertex_label_size = 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "In Katz centrality (as in eigenvector centrality), a very important vertex makes the vertices it points to important. This could be undesirable when very important vertices also point to many other vertices, as is the case with Google, which points to many websites, none of which is necessarily particularly important. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PageRank\n",
    "#### Importance can be diluted.\n",
     "We said that in eigenvector centrality, each vertex \"votes\" for the importance of each of its neighbors, one vote per neighbor. In the PageRank formulation of centrality, each vertex gets only one \"vote\" total, so if it has many neighbors, each neighbor gets only a fraction of that vote. This is achieved by dividing each vertex's contribution by its out-degree.\n",
    "\n",
    "$$ x_{i}' = \\alpha \\sum_j A_{ij}\\frac{x_j}{k^{out}_j} + \\beta $$\n",
    "\n",
    "I won't walk through the matrix manipulation, but the above equation is equivalent to:\n",
     "$$ \\mathbf{x} = \\beta\\,\\mathbf{D}(\\mathbf{D} - \\alpha\\mathbf{A})^{-1}\\mathbf{1} $$\n",
     "Where $\\mathbf{D}$ is the diagonal matrix with elements $D_{ii} = \\text{max}(k_i^{out},1)$ (ensuring that $\\mathbf{D}$ has no zeros on its diagonal makes it invertible, and avoids dividing by zero for vertices with no outgoing edges). "
   ]
  },
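  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The closed form above can be sketched on a hypothetical 3-vertex toy graph (the helper name `pagerank_closed_form` is my own; $\\beta$ is set to 1 as before):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def pagerank_closed_form(A, alpha = .85):\n",
    "    # x = D (D - alpha*A)^{-1} 1, with D_ii = max(out-degree of i, 1)\n",
    "    D = np.diag(np.maximum(np.sum(A, 1), 1))\n",
    "    return np.dot(np.dot(D, np.linalg.inv(np.subtract(D, alpha * A))), np.ones(A.shape[0]))\n",
    "\n",
    "# hypothetical toy graph: vertex 0 points to 1 and 2, and vertex 1 points to 2\n",
    "A_toy = np.array([[0., 1., 1.], [0., 0., 1.], [0., 0., 0.]])\n",
    "print(pagerank_closed_form(A_toy))"
   ]
  },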
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
     "# we need an out-degree vector with no zeros in it, so that we don't divide by zero\n",
     "# it's okay to simply replace zeros with ones, because the matrix multiplication will zero those entries out again\n",
     "outdegree_no_zeros = outdegree.copy()  # copy, so that we don't modify outdegree in place\n",
     "outdegree_no_zeros[outdegree_no_zeros == 0] = 1\n",
    "Degree = np.diag(outdegree_no_zeros)\n",
    "outdegree_no_zeros = outdegree_no_zeros.reshape(num_nodes,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# iterative formulation\n",
    "pagerank = ones\n",
    "err = 100\n",
    "tol = .001\n",
    "pagerank_alpha = .85\n",
    "while err > tol:\n",
    "    pagerank_new = (pagerank_alpha * np.dot(adj_mat, np.divide(pagerank, outdegree_no_zeros))) + ones\n",
    "    err = sum(abs(np.subtract(pagerank_new, pagerank)))\n",
    "    pagerank = pagerank_new\n",
    "sorted_pagerank = sorted(zip(network.vs[\"name\"], [x/float(max(pagerank)) for x in pagerank]), key = lambda x:-x[1])\n",
     "dict_pagerank = dict(sorted_pagerank)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plotting code, ignore\n",
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "# plot katz centrality\n",
    "axes[0].set_title(\"PageRank Centrality\", size = 16)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], [x[1] for x in sorted_pagerank], \n",
    "         color = 'purple', width = .8)\n",
    "axes[0].set_ylabel(\"PageRank Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .9 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_pagerank], rotation = 90)\n",
    "# compare with other measures\n",
    "fade  = .3\n",
    "node_ordering = sorted_degree_centrality #sorted_pagerank\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple')\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = fade)\n",
    "axes[1].set_ylabel(\"PageRank vs Katz vs Eigenvector vs Degree\", size = 16)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"pagerank centrality\", \"katz centrality\", \"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# matrix formulation, in case you didn't believe my hasty non-derivation\n",
    "D_minus_alpha_a = np.subtract(Degree, pagerank_alpha * adj_mat)\n",
    "D_minus_alpha_a_inv = np.linalg.inv(D_minus_alpha_a)\n",
    "pagerank_matrix_method = np.dot(np.dot(Degree, D_minus_alpha_a_inv), ones)\n",
     "print(tabulate.tabulate([[x[0], x[1]] for x in \n",
     "                         sorted(zip(network.vs[\"name\"], pagerank_matrix_method), key = lambda x:-x[1])[0:8]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hubs and Authorities\n",
    "#### On a directed network, importance can have direction\n",
    "\n",
     "It is conceivable that you might want to distinguish vertices that point to important vertices from vertices that are pointed to by important vertices (being pointed to is what the previously discussed centrality measures reward). \n",
    "\n",
     "In this case, we can distinguish between vertices with a high *authority centrality* (vertices with many important incoming connections, quite similar to the previously discussed metrics) and vertices with a high *hub centrality* (measuring how many important authorities a vertex points to).\n",
    "\n",
     "Call the authority centrality $\\mathbf{x}$ and the hub centrality $\\mathbf{y}$, then solve:\n",
    "$$ \\mathbf{x} = \\alpha\\mathbf{A}\\mathbf{y} $$\n",
    "$$ \\mathbf{y} = \\beta\\mathbf{A}^T\\mathbf{x} $$"
   ]
  },
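  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Substituting one equation into the other gives $\\mathbf{x} = \\alpha\\beta\\,\\mathbf{A}\\mathbf{A}^T\\mathbf{x}$ and $\\mathbf{y} = \\alpha\\beta\\,\\mathbf{A}^T\\mathbf{A}\\mathbf{y}$, so the two scores are principal eigenvectors of $\\mathbf{A}\\mathbf{A}^T$ and $\\mathbf{A}^T\\mathbf{A}$ respectively. A minimal sketch on a hypothetical toy matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "# hypothetical toy directed graph (entry (i,j) = 1 means an edge from i to j)\n",
    "A_toy = np.array([[0., 1., 1.], [0., 0., 1.], [0., 0., 0.]])\n",
    "# x solves x = (alpha*beta) A A^T x: the principal eigenvector of A A^T\n",
    "x_vals, x_vecs = np.linalg.eig(np.dot(A_toy, A_toy.T))\n",
    "x_score = np.abs(x_vecs[:, np.argmax(np.real(x_vals))])\n",
    "# y solves y = (alpha*beta) A^T A y: the principal eigenvector of A^T A\n",
    "y_vals, y_vecs = np.linalg.eig(np.dot(A_toy.T, A_toy))\n",
    "y_score = np.abs(y_vecs[:, np.argmax(np.real(y_vals))])\n",
    "print(x_score / max(x_score))\n",
    "print(y_score / max(y_score))"
   ]
  },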
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
     "normed_ones = np.ones(num_nodes) / np.linalg.norm(np.ones(num_nodes))\n",
    "err = 100\n",
    "tol = .001\n",
    "hub_score = normed_ones\n",
    "authority_score = normed_ones\n",
    "alpha = 1\n",
    "beta= 1\n",
    "while err > tol:\n",
    "    # find the new hub and authority scores\n",
    "    authority_score_new_unnormed = alpha * np.dot(adj_mat, hub_score)\n",
    "    hub_score_new_unnormed = beta * np.dot(np.transpose(adj_mat), authority_score)\n",
    "    # norm the scores (we only care about proportional values anyway)\n",
    "    authority_score_new = np.divide(authority_score_new_unnormed, np.linalg.norm(authority_score_new_unnormed))\n",
    "    hub_score_new = np.divide(hub_score_new_unnormed, np.linalg.norm(hub_score_new_unnormed))\n",
    "    # find error and update\n",
    "    err = sum(abs(np.subtract(hub_score_new, hub_score))) + sum(abs(np.subtract(authority_score_new, authority_score)))\n",
    "    authority_score = authority_score_new\n",
    "    hub_score = hub_score_new\n",
    "dict_authority = dict(zip(network.vs[\"name\"], [x/max(authority_score) for x in authority_score] ))\n",
    "dict_hub = dict(zip(network.vs[\"name\"], [x/max(hub_score) for x in hub_score]  ))\n",
    "sorted_authority_score = sorted(dict_authority.items(), key = lambda x: -x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "# plot katz centrality\n",
    "axes[0].set_title(\"Hub/Authority Centrality\", size = 16)\n",
    "axes[0].bar([x + .2 for x in range(0,num_nodes)], [x[1] for x in sorted_authority_score], color = 'indianred', width = .4)\n",
    "axes[0].bar([x + .6 for x in range(0,num_nodes)], [dict_hub[x[0]] for x in sorted_authority_score], color = 'brown', width = .4)\n",
    "axes[0].set_ylabel(\"Hub/Authority Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .8 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_authority_score], rotation = 90)\n",
    "axes[0].legend([\"authority score\", \"hub score\"])\n",
    "# compare with other measures\n",
    "fade = .3\n",
    "node_ordering = sorted_degree_centrality #sorted_pagerank\n",
    "axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred')\n",
    "axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown')\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = fade)\n",
    "axes[1].set_ylabel(\"Hub/Authority vs PageRank vs Katz vs Eigenvector vs Degree\", size = 12)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"authority score\", \"hub score\", \n",
    "                \"pagerank centrality\", \"katz centrality\", \"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Let's look at how hub and authority scores are correlated\n",
    "auth_hub = [(y[0][0], y[0][1], y[1]) \n",
    "            for y in zip(sorted_authority_score,[dict_hub[x[0]] for x in sorted_authority_score])]\n",
    "plt.figure(figsize = (10,8))\n",
    "plt.plot([x[1] for x in auth_hub], [x[2] for x in auth_hub], 'o', color = 'indianred')\n",
    "plt.plot([0,1],[0,1], 'k--')\n",
    "plt.ylim(-.05,1.05)\n",
    "plt.xlim(-.05,1.05)\n",
    "for label, x, y in auth_hub:\n",
    "    plt.annotate(label, (x - .02,y + .011), size = 8)\n",
    "plt.grid()\n",
    "plt.title(\"Authority score vs hub score\", size = 16)\n",
    "plt.ylabel(\"Authority Score\", size = 16)\n",
    "_ = plt.xlabel(\"Hub Score\", size = 16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distance-based centrality metrics\n",
    "\n",
    "So far, we have formulated cetrality based on the adjacency matrix. A different way of thinking about centrality is to think about the message-passing ability of each node--how far and how quickly can it pass a message across the network? How imporant is it to the flow of information across the network?\n",
    "\n",
    "This notion of centrality is based on the distance from a vertex to other vertices on the graph."
   ]
  },
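  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before defining the metrics, here is a minimal sketch (a toy directed path, not the kaptail data) of the geodesic distances they are built on: a breadth-first search from a vertex gives the shortest path length from that vertex to every other reachable vertex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# a minimal sketch (toy graph, not the kaptail network): geodesic distances via BFS\n",
    "from collections import deque\n",
    "toy_adj_list = {0: [1], 1: [2], 2: [3], 3: []}  # directed path 0 -> 1 -> 2 -> 3\n",
    "def bfs_distances(adj_list, source):\n",
    "    dist = {source: 0}\n",
    "    queue = deque([source])\n",
    "    while queue:\n",
    "        v = queue.popleft()\n",
    "        for w in adj_list[v]:\n",
    "            if w not in dist:\n",
    "                dist[w] = dist[v] + 1\n",
    "                queue.append(w)\n",
    "    return dist\n",
    "print(bfs_distances(toy_adj_list, 0))  # {0: 0, 1: 1, 2: 2, 3: 3}"
   ]
  },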
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# it doesn't make sense to calculate these metrics on components that aren't connected, so I'll get the \n",
    "# stringly connected component to use in the next few metrics\n",
    "connected_components = network.clusters(mode = 'STRONG')\n",
    "giant_component = max(zip([len(c) for c in connected_components], connected_components), key = lambda x:x[0])[1]\n",
    "connected_subnetwork = network.subgraph(giant_component)\n",
    "unconnected_nodes = list(set(network.vs[\"name\"]) - set(connected_subnetwork.vs[\"name\"]))\n",
    "connected_subnetwork.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "num_connected_nodes = len(connected_subnetwork.vs)\n",
    "adj_mat_connected_subnetwork = np.array(connected_subnetwork.get_adjacency().data)\n",
    "outdegree_connected_subnetwork = np.sum(adj_mat,1)\n",
    "indegree_connected_subnetwork = np.sum(adj_mat,0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Closeness Centrality\n",
    "#### How close is a vertex to the rest of the graph?\n",
    "Closeness centrality measures the average geodesic (shortest) distance between a vertex and every other vertex on the network.\n",
    "\n",
    "Because the average distance would be small for the most central node and large for the least central, we take the inverse of the average distance in order to create a centrality score. \n",
    "\n",
    "$d_{ij}$ denotes the distance between vertices $i$ and $j$, $n$ denotes the total number of vertices in the graph:\n",
    "\n",
    "$$ C_i = \\frac{n}{\\sum_j d_{ij}} $$ "
   ]
  },
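  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hand-checkable sketch of this formula (a toy undirected path, not the kaptail data): on the path graph 0-1-2-3 the two middle vertices have the smallest total distance to the rest of the graph, so they get the highest closeness."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# a minimal sketch of the closeness formula on the undirected path 0-1-2-3\n",
    "# (toy data, not the kaptail network); the distance matrix is easy to check by hand\n",
    "import numpy as np\n",
    "toy_dists = np.array([[0, 1, 2, 3],\n",
    "                      [1, 0, 1, 2],\n",
    "                      [2, 1, 0, 1],\n",
    "                      [3, 2, 1, 0]])\n",
    "toy_n = toy_dists.shape[0]\n",
    "toy_closeness = toy_n / toy_dists.sum(axis=1).astype(float)\n",
    "print(toy_closeness)  # middle vertices score highest (1.0 vs 2/3 for the ends)"
   ]
  },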
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "closeness_centrality = []\n",
    "for vertex in connected_subnetwork.vs:\n",
    "    closeness_centrality.append(num_connected_nodes / float(sum(connected_subnetwork.shortest_paths(vertex)[0])))\n",
    "dict_closeness_centrality = defaultdict(float)\n",
    "for i in range(0, num_connected_nodes):\n",
    "    dict_closeness_centrality[connected_subnetwork.vs[\"name\"][i]] = closeness_centrality[i]/max(closeness_centrality)\n",
    "for unconnected_node in unconnected_nodes:\n",
    "    dict_closeness_centrality[unconnected_node] = 0\n",
    "sorted_closeness = sorted(dict_closeness_centrality.items(), key = lambda x: -x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "# plot centrality\n",
    "axes[0].set_title(\"Closeness Centrality\", size = 16)\n",
    "axes[0].bar([x + .4 for x in range(0,num_nodes)], [x[1] for x in sorted_closeness], \n",
    "         color = 'dimgray', width = .8)\n",
    "axes[0].set_ylabel(\"Closeness Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .8 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_closeness], rotation = 90)\n",
    "# compare with other measures\n",
    "fade = .3\n",
    "node_ordering = sorted_degree_centrality #sorted_pagerank\n",
    "axes[1].plot(range(0,num_nodes), [dict_closeness_centrality[x[0]] for x in node_ordering], color = 'dimgray')\n",
    "axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_closeness_centrality[x[0]] for x in node_ordering], color = 'dimgray', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = fade)\n",
    "axes[1].set_ylabel(\"PageRank vs Katz vs Eigenvector vs Degree\", size = 16)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"closeness centrality\", \"authority score\", \"hub score\", \n",
    "                \"pagerank centrality\", \"katz centrality\", \"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might notice (by looking at the histogram) that the actual range of values for the closeness centrality score is quite low, and the difference between the most central vertices is quite small. If you remember a previous discussion on small-diameter social graphs, this is because the average distance between any two nodes on a random graph tends to be both reatively constant and quite low. \n",
    "\n",
    "This lack of range is a problem with the closeness centrality score, as it makes the ranking sensitive to small perturbations in the structure of the network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Betweenness Centrality\n",
    "#### How often do we pass through this vertex when traveling to the rest of the graph? \n",
    "\n",
    "Example: London would have a high betweenness centrality in a map of airplane routes from the US to Europe. You often travel through London to get to Europe.\n",
    "\n",
    "A vertex with high betweennness centrality is one that connects many disparate parts of the network (indeed, one algorithm to find communitites in a network iteratively cuts the network on vertices with a high betweenness centrality). \n",
    "\n",
    "The most common formulation of betweenness centrality counts the number of times a certain vertex $i$ appears on the shortest path between every pair of verties on the network ($s,t$). In the case where there are multiple shortest paths between $s$ and $t$, we count the fraction of times that $i$ appears on those paths.\n",
    "\n",
    "Define $n^i_{s,t} = 1$ if $i$ appears on the geodesic (shortest) path between two vertices $s$ and $t$, and define $g_{s,t}$ as the total number of geodesic paths between vertices $s$ and $t$.\n",
    "\n",
    "$$ x_i = \\sum_{s,t}\\frac{n^i_{s,t}}{g_{s,t}} $$"
   ]
  },
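  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hand-checkable sketch of the sum above (a toy undirected path 0-1-2-3, not the kaptail data). Each pair of vertices on a path graph has exactly one geodesic, listed by hand below. This sketch counts only vertices strictly between the endpoints, while the implementation in the next cell also counts a vertex when it is an endpoint of the path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# a minimal sketch of the betweenness sum on the undirected path 0-1-2-3\n",
    "# (toy data, not the kaptail network); geodesic paths are listed by hand\n",
    "from collections import defaultdict\n",
    "toy_geodesics = {(0, 1): [[0, 1]], (0, 2): [[0, 1, 2]], (0, 3): [[0, 1, 2, 3]],\n",
    "                 (1, 2): [[1, 2]], (1, 3): [[1, 2, 3]], (2, 3): [[2, 3]]}\n",
    "toy_betweenness = defaultdict(float)\n",
    "for v in range(4):\n",
    "    for pair, geo_paths in toy_geodesics.items():\n",
    "        # fraction of geodesic paths with v strictly between the endpoints\n",
    "        n_v = sum(1 for p in geo_paths if v in p[1:-1])\n",
    "        toy_betweenness[v] += float(n_v) / len(geo_paths)\n",
    "print(dict(toy_betweenness))  # vertices 1 and 2 each sit inside two geodesics"
   ]
  },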
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "shortest_paths_all_pairs = {}\n",
    "for vertex in range(0,num_connected_nodes):\n",
    "    for dest in range(0,num_connected_nodes):\n",
    "        shortest_paths_all_pairs[(vertex, dest)] = connected_subnetwork.get_all_shortest_paths(vertex, to = dest)\n",
    "\n",
    "betweenness_sums = defaultdict(int)\n",
    "for v in range(0,num_connected_nodes):\n",
    "    for i in range(0,num_connected_nodes):\n",
    "        for j in range(0,num_connected_nodes):\n",
    "            n_v_vector = [int(v in path) for path in shortest_paths_all_pairs[(i,j)]]\n",
    "            betweenness_sums[v] += float(sum(n_v_vector))/float(len(n_v_vector))\n",
    "\n",
    "betweenness = [x[1] for x in sorted(betweenness_sums.items(), key = lambda x:x[0])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print tabulate.tabulate([[x[0], x[1]] for x in \n",
    "                   sorted(zip(connected_subnetwork.vs[\"name\"], betweenness), key = lambda x: -x[1])][0:8])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### An alternative to the shortest distance formulation\n",
    "\n",
    "The above formulation of betweenness centrality assumes that information (or people, planes, etc) travels along the most efficient (shortest) path on the network. This is by far the most common formulation, and in gerneral, it's what scientists mean when they say \"betweenness centrality.\" However, for some kinds of networks it's more helpful to assume a different sort of information diffusion.\n",
    "\n",
    "One alternative (of many) is to instead count the average number of times that some vertex $i$ on a random walk between vertices $s$ and $t$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# random walk betweenness\n",
    "iterations = 50\n",
    "give_up_after = 2*num_connected_nodes\n",
    "paths = defaultdict(list)\n",
    "# random walk between every pair of vertices\n",
    "for vertex in range(0,num_connected_nodes):\n",
    "    for dest in range(0,num_connected_nodes):\n",
    "        for i in range(0,iterations):\n",
    "            # start at the first vertex\n",
    "            current_vertex = vertex\n",
    "            path = [current_vertex]\n",
    "            # now wander around\n",
    "            steps = 0\n",
    "            while (current_vertex != dest) and (steps < give_up_after):\n",
    "                steps += 1\n",
    "                # choose a random vertex to walk to \n",
    "                options = np.where(adj_mat_connected_subnetwork[vertex] != 0)[0]\n",
    "                # dead end\n",
    "                if options.size == 0:\n",
    "                    path = []\n",
    "                    break\n",
    "                else:\n",
    "                    current_vertex = np.random.choice(options)\n",
    "                    path.append(current_vertex)\n",
    "            paths[(vertex,dest)].append(path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "flow_betweenness_sums = defaultdict(int)\n",
    "for v in range(0,num_connected_nodes):\n",
    "    for i in range(0,num_connected_nodes):\n",
    "        for j in range(0,num_connected_nodes):\n",
    "            n_v_vector = [int(v) in path for path in paths[(i,j)]]\n",
    "            flow_betweenness_sums[v] += float(sum(n_v_vector))/float(len(n_v_vector))\n",
    "flow_betweenness = [x[1] for x in sorted(flow_betweenness_sums.items(), key = lambda x:x[0])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dict_betweenness = dict(zip(connected_subnetwork.vs[\"name\"], [x/max(betweenness) for x in betweenness]))\n",
    "dict_flow_betweenness = dict(zip(connected_subnetwork.vs[\"name\"], [x/max(flow_betweenness) for x in flow_betweenness]))\n",
    "for unconnected_node in unconnected_nodes:\n",
    "    dict_betweenness[unconnected_node] = 0\n",
    "    dict_flow_betweenness[unconnected_node] = 0\n",
    "sorted_flow_betweenness = sorted(dict_flow_betweenness.items(), key = lambda x: -x[1])\n",
    "sorted_betweenness = sorted(dict_betweenness.items(), key = lambda x: -x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print tabulate.tabulate([[x[0], x[1]] for x in \n",
    "                   sorted(zip(connected_subnetwork.vs[\"name\"], flow_betweenness), key = lambda x: -x[1])][0:8])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2,1, figsize = (16,12))\n",
    "plt.subplots_adjust(bottom = .1)\n",
    "# plot centrality\n",
    "axes[0].set_title(\"Betweenness Centrality\", size = 16)\n",
    "axes[0].bar([x + .2 for x in range(0,num_nodes)], [x[1] for x in sorted_betweenness], color = 'cyan', width = .4)\n",
    "axes[0].bar([x + .6 for x in range(0,num_nodes)], [dict_flow_betweenness[x[0]] for x in sorted_betweenness], color = 'darkcyan', width = .4)\n",
    "axes[0].set_ylabel(\"Betweenness Centrality Values\", size = 16)\n",
    "axes[0].set_xticks([x + .8 for x in range(0,num_nodes)])\n",
    "_ = axes[0].set_xticklabels([x[0] for x in sorted_betweenness], rotation = 90)\n",
    "axes[0].legend([\"betweenness by geodesic distances\", \"betweenness by random walk\"])\n",
    "# compare with other measures\n",
    "fade = .3\n",
    "node_ordering = sorted_degree_centrality #sorted_pagerank\n",
    "axes[1].plot(range(0,num_nodes), [dict_betweenness[x[0]] for x in node_ordering], color = 'cyan')\n",
    "axes[1].plot(range(0,num_nodes), [dict_flow_betweenness[x[0]] for x in node_ordering], color = 'darkcyan')\n",
    "axes[1].plot(range(0,num_nodes), [dict_closeness_centrality[x[0]] for x in node_ordering], color = 'dimgray', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', alpha = fade)\n",
    "axes[1].plot(range(0,num_nodes), [dict_betweenness[x[0]] for x in node_ordering], color = 'cyan', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_flow_betweenness[x[0]] for x in node_ordering], color = 'darkcyan', marker = 'o')\n",
    "axes[1].plot(range(0,num_nodes), [dict_closeness_centrality[x[0]] for x in node_ordering], color = 'dimgray', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_authority[x[0]] for x in node_ordering], color = 'indianred', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_hub[x[0]] for x in node_ordering], color = 'brown', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_pagerank[x[0]] for x in node_ordering], color = 'purple', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_katz_centrality[x[0]] for x in node_ordering], color = 'limegreen', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [dict_eigenvector_centrality[x[0]] for x in node_ordering], color = 'royalblue', marker = 'o', alpha = fade)\n",
    "#axes[1].plot(range(0,num_nodes), [scaled_degree[x[0]] for x in node_ordering], color = 'navajowhite', marker = 'o', alpha = fade)\n",
    "axes[1].set_ylabel(\"Betweenness vs PageRank vs Katz vs Eigenvector vs Degree\", size = 12)\n",
    "axes[1].set_xticks([x for x in range(0,num_nodes)])\n",
    "axes[1].grid()\n",
    "axes[1].set_xlim(-0.5,38.5)\n",
    "axes[1].set_ylim(0,1.05)\n",
    "axes[1].legend([\"betweenness\", \"flow betweenness\", \"closeness centrality\"])#, \"authority score\", \"hub score\", \"pagerank centrality\", \"katz centrality\", \"eigenvector centrality\", \"degree centrality\"])\n",
    "_ = axes[1].set_xticklabels([x[0] for x in node_ordering], rotation = 90)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## That's all folks!\n",
    "\n",
    "I'd encourage you to mess around with the plotting and these metrics to try to figure out which centrality measures are more or less correlated. I'd also look at how these play with different (potentially larger) networks or real social network data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
